Abstract: Information Dispersal Algorithms (IDAs) have been 
widely applied to reliable and secure storage and transmission 
of data files in distributed systems. An IDA is a method that 
encodes a file F of size $L = |F|$ into n unrecognizable pieces 
$F_1, F_2, \ldots, F_n$, each of size L/m (m < n), so that the original 
file F can be reconstructed from any m pieces. The core of an 
IDA is the adopted non-systematic m-of-n erasure code. This 
paper makes a systematic study on the confidentiality of an IDA 
and its connection with the adopted erasure code. Two levels 
of confidentiality are defined: weak confidentiality (in the case 
where some parts of the original file F can be reconstructed 
explicitly from fewer than m pieces) and strong confidentiality (in 
the case where nothing of the original file F can be reconstructed 
explicitly from fewer than m pieces). For an IDA that adopts an 
arbitrary non-systematic erasure code, its confidentiality may fall 
into weak confidentiality. To achieve strong confidentiality, this 
paper explores a sufficient and feasible condition on the adopted 
erasure code. Then, this paper shows that Rabin's IDA has 
strong confidentiality. At the same time, this paper presents an 
effective way to construct an IDA with strong confidentiality from 
an arbitrary m-of-(m + n) erasure code. Then, as an example, 
this paper constructs an IDA with strong confidentiality from 
a Reed-Solomon code, the computation complexity of which is 
comparable to or sometimes even lower than that of Rabin's 
IDA. 

Index Terms — Cauchy matrix, confidentiality, erasure code, 
information dispersal algorithm, Reed-Solomon code, Vander- 
monde matrix. 



I. Introduction 

In 1989, Rabin [1] proposed an attractive Information Dispersal 
Algorithm (IDA) that is applicable to reliable and secure 
storage and transmission of data files in distributed systems. 
Since then, IDAs have drawn much attention from both 
researchers and engineers in the area of distributed systems. 

An IDA is a method that encodes a file F of size $L = |F|$ 
into n unrecognizable pieces $F_1, F_2, \ldots, F_n$, each of size 
L/m (m < n), so that the original file F can be reconstructed 
from any m pieces. From a coding theorist's viewpoint, an 
IDA corresponds to a non-systematic m-of-n erasure code 
[2]. Here, the non-systematic property of the erasure code is 
necessary to ensure "unrecognizable" pieces. In practice, an 
IDA is implemented as follows: The original file F is first 
divided into m segments $S_1, S_2, \ldots, S_m$, each of size L/m. 
Then, the m segments are encoded into n unrecognizable 
pieces $F_1, F_2, \ldots, F_n$ using a non-systematic m-of-n erasure 
code. 

The reliability of an IDA is clear: losing up to $n - m$ 
of the n pieces $F_1, F_2, \ldots, F_n$ does not result 
in data loss. However, the confidentiality of an IDA is not 
straightforward and deserves a systematic study. 

From the view of information-theoretic security [3], IDAs 
can provide only incremental confidentiality and thus have 
weaker confidentiality than secret sharing (with perfect 
confidentiality) [4]-[6] and ramp schemes (with partially 
perfect confidentiality) [7], [8]. However, IDAs can achieve 
optimal efficiency in data overhead [1]. As shown in [9, 
Page 65, Table A], there is a trade-off between confidentiality 
and data overhead. Moreover, for practical applications, 
information-theoretic security is often extravagant and 
unnecessary. Thus, in this paper, we will study the practical security 
that IDAs can provide. 

Although a non-systematic erasure code can ensure "un- 
recognizable" pieces in an IDA, some segments may still be 
reconstructed explicitly from fewer than m pieces. Then, an 
eavesdropper who acquires fewer than m pieces by snooping 
may reconstruct some parts of the original file F explicitly, 
resulting in partial file leakage. In the case of partial file 
leakage, we say the IDA has weak confidentiality. However, 
for an ideal IDA, no segment of the original file F should 
be reconstructible explicitly from fewer than m pieces. In the 
case where nothing of the original file F can be reconstructed 
explicitly from fewer than m pieces, we say the IDA has strong 
confidentiality. 

For an IDA that adopts an arbitrary non-systematic erasure 
code, we noticed that its confidentiality may fall into weak 
confidentiality. In this paper, we will first show in Section III 
which kinds of IDAs have weak confidentiality and how an 
eavesdropper can reconstruct some segments of the original 
file F explicitly from fewer than m pieces in the case of 
weak confidentiality. Then, to achieve strong confidentiality, 
we explore a sufficient and feasible condition for an IDA 
in Section IV. We show that Rabin's IDA [1] has strong 
confidentiality. At the same time, we present an effective 
way to construct an IDA with strong confidentiality from an 
arbitrary m-of-(m + n) erasure code. Then, as an example, 
we construct an IDA with strong confidentiality from a 
Reed-Solomon code [10], the computation complexity of which is 
comparable to or sometimes even lower than that of Rabin's 
IDA. Finally, we conclude this paper in Section V. To our 
knowledge, this paper is the first work that focuses on the 
issues of weak confidentiality and strong confidentiality in 
IDAs. 

To make our later discussion more easily understood, we 
begin this paper with a brief introduction of IDAs and their 
erasure codes. 

II. IDAs and Their Erasure Codes 

In an Information Dispersal Algorithm (IDA), a 
non-systematic m-of-n erasure code is employed to encode 
the m segments $S_1, S_2, \ldots, S_m$ into n unrecognizable pieces 
$F_1, F_2, \ldots, F_n$, i.e., 

$$(S_1, S_2, \ldots, S_m) \cdot G_{m \times n} = (F_1, F_2, \ldots, F_n), \tag{1}$$

where $G_{m \times n}$ is the generator matrix of the adopted erasure 
code and meets the following two conditions: 

1) No column of $G_{m \times n}$ is equal to any column of the 
$m \times m$ identity matrix; and 

2) Any m columns of $G_{m \times n}$ form an $m \times m$ nonsingular 
matrix. 

The first condition ensures that any piece is unrecognizable, 
while the second condition ensures that the original file F can 
be reconstructed from any m pieces. 
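
To make Equation (1) and the two conditions concrete, the following is a minimal sketch rather than a reference implementation. It assumes that symbols are integers in the prime field GF(257) (so every byte value is a valid symbol, although encoded symbols may take the value 256), and the names `encode`, `decode`, `mat_inv_mod` and `meets_conditions`, as well as the small matrix `G`, are hypothetical choices made only for this illustration.

```python
from itertools import combinations

P = 257  # assumed prime modulus; every symbol is an integer in [0, P)


def mat_inv_mod(a, p=P):
    """Invert a square matrix over GF(p) by Gauss-Jordan elimination."""
    n = len(a)
    # Augment the matrix with the identity matrix.
    aug = [[x % p for x in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(p)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)  # modular inverse (Python 3.8+)
        aug[col] = [x * inv % p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(x - f * y) % p for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def encode(segments, g, p=P):
    """Equation (1): (S_1..S_m) . G = (F_1..F_n), applied symbol by symbol."""
    m, n = len(g), len(g[0])
    length = len(segments[0])
    return [[sum(segments[i][t] * g[i][j] for i in range(m)) % p
             for t in range(length)] for j in range(n)]


def decode(pieces, g, p=P):
    """Rebuild the m segments from a dict {column index: piece} holding m pieces."""
    cols = sorted(pieces)
    m = len(g)
    sub = [[g[i][j] for j in cols] for i in range(m)]  # the m chosen columns of G
    inv = mat_inv_mod(sub, p)
    length = len(pieces[cols[0]])
    # (S_1..S_m) = (F_cols) . sub^{-1}
    return [[sum(pieces[cols[k]][t] * inv[k][i] for k in range(m)) % p
             for t in range(length)] for i in range(m)]


def meets_conditions(g, p=P):
    """Check conditions 1) and 2) on the generator matrix by brute force."""
    m, n = len(g), len(g[0])
    unit = {tuple(int(i == k) for i in range(m)) for k in range(m)}
    if any(tuple(g[i][j] % p for i in range(m)) in unit for j in range(n)):
        return False  # some column equals an identity column
    for cols in combinations(range(n), m):
        try:
            mat_inv_mod([[g[i][j] for j in cols] for i in range(m)], p)
        except ValueError:
            return False  # some m columns are linearly dependent
    return True


# A 2-of-4 example: file "HELLO!" split into m = 2 segments of 3 symbols.
G = [[1, 1, 1, 1],
     [1, 2, 3, 4]]
data = list(b"HELLO!")
segments = [data[:3], data[3:]]
pieces = encode(segments, G)
recovered = decode({1: pieces[1], 3: pieces[3]}, G)  # any 2 pieces suffice
```

Here `recovered` equals `segments`, and the same holds for every other pair of pieces, which is exactly what the second condition guarantees.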

III. Which Kind of IDAs Has Weak Confidentiality 

In this section, we show which kinds of Information 
Dispersal Algorithms (IDAs) have weak confidentiality and 
how an eavesdropper can reconstruct some segments of the 
original file F explicitly from fewer than m pieces in the case 
of weak confidentiality. We present a theorem as follows: 

Theorem 3.1: An IDA has weak confidentiality if and only 
if the adopted erasure code meets the following condition: In 
its generator matrix $G_{m \times n}$, there is a submatrix $A_{m' \times n'}$ of 
column rank r, where $m', n' < m$ and $n' - r = m - m' > 0$. 

Proof: We first prove the sufficiency. Suppose $A_{m' \times n'}$ 
is located in rows $i_1, i_2, \ldots, i_{m'}$ and columns $j_1, j_2, \ldots, j_{n'}$ 
of $G_{m \times n}$. Then, $S_{i_1}, S_{i_2}, \ldots, S_{i_{m'}}$ are the m' segments 
corresponding to rows $i_1, i_2, \ldots, i_{m'}$ of $G_{m \times n}$. Similarly, 
$F_{j_1}, F_{j_2}, \ldots, F_{j_{n'}}$ are the n' pieces corresponding to columns 
$j_1, j_2, \ldots, j_{n'}$ of $G_{m \times n}$. Since $n' - r = m - m' > 0$, 
$r = m' + n' - m$. Furthermore, since $m', n' < m$, then 
$r < m', n'$. Thus, $A_{m' \times n'}$ is rank deficient. Then, in $A_{m' \times n'}$, 
some $k = n' - r$ columns can be linearly represented by 
the other r columns. Let $A_{m' \times n'} = (v_1, v_2, \ldots, v_{n'})$, where 
$v_1, v_2, \ldots, v_{n'}$ are column vectors. Suppose there is a linear 
relation among the column vectors of $A_{m' \times n'}$ as follows: 

$$(v_1, v_2, \ldots, v_k) = (v_{k+1}, v_{k+2}, \ldots, v_{n'}) \cdot B_{r \times k},$$

where $B_{r \times k}$ is the transpose of the coefficient matrix. Then, 
any information of $S_{i_1}, S_{i_2}, \ldots, S_{i_{m'}}$ can be eliminated by 
calculating 

$$(F'_{j_1}, F'_{j_2}, \ldots, F'_{j_k}) = (F_{j_1}, F_{j_2}, \ldots, F_{j_k}) - (F_{j_{k+1}}, F_{j_{k+2}}, \ldots, F_{j_{n'}}) \cdot B_{r \times k}. \tag{2}$$

Finally, according to Equation (1), the other $m - m' = k$ segments 
except $S_{i_1}, S_{i_2}, \ldots, S_{i_{m'}}$ can be reconstructed explicitly. 
We now prove the necessity. If some segments can be 
reconstructed explicitly from fewer than m pieces in an IDA, 
it is clear that the information of the other segments must be 
eliminable from the eavesdropped pieces by linear 
operations. Moreover, it must be possible to form a solvable 
system of linear equations on these reconstructible segments. 
From the above proof of sufficiency, we can deduce that in 
the corresponding generator matrix $G_{m \times n}$, there must be a 
submatrix $A_{m' \times n'}$ of column rank r, where $m', n' < m$ and 
$n' - r = m - m' > 0$. ■ 
Remark: From the second condition in the previous section, 
we can deduce a necessary condition $n' - r \le m - m'$ 
(otherwise the corresponding n' columns of $G_{m \times n}$ would form 
a rank-deficient matrix, a contradiction). This condition does 
not exclude the equality case $n' - r = m - m'$ of Theorem 3.1. 
Thus, for an IDA that adopts an arbitrary non-systematic erasure 
code, its confidentiality may fall into weak confidentiality. 
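
As an illustration of Theorem 3.1, the sketch below works through one small case. The 3-of-4 generator matrix `G`, the field GF(257) and the helper `det_mod` are assumptions introduced only for this example. The matrix satisfies both conditions of Section II, yet its submatrix in rows 1, 2 and columns 1, 2 has rank r = 1, so that m' = n' = 2 and n' - r = m - m' = 1. An eavesdropper holding only the first two pieces can then apply Equation (2) and recover the third segment.

```python
from itertools import combinations

P = 257  # assumed prime modulus, chosen only for illustration


def det_mod(a, p=P):
    """Determinant of a square matrix over GF(p) by Gaussian elimination."""
    a = [[x % p for x in row] for row in a]
    n, det = len(a), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)
        for r in range(col + 1, n):
            f = a[r][col] * inv % p
            if f:
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[col])]
    return det % p


# Hypothetical non-systematic 3-of-4 generator matrix (m = 3, n = 4).
G = [[1, 1, 1, 1],
     [1, 1, 2, 2],
     [1, 2, 1, 3]]

# Condition 2 of Section II: any 3 columns form a nonsingular matrix.
any_three_ok = all(det_mod([[G[i][j] for j in cols] for i in range(3)]) != 0
                   for cols in combinations(range(4), 3))

# Submatrix A in rows 1, 2 and columns 1, 2 (0-based: 0, 1) is singular.
A = [[G[i][j] for j in (0, 1)] for i in (0, 1)]

segments = [list(b"top "), list(b"secr"), list(b"et!!")]
pieces = [[sum(segments[i][t] * G[i][j] for i in range(3)) % P
           for t in range(4)] for j in range(4)]

# The eavesdropper holds only F_1 and F_2. Inside A, v_1 = v_2 * B with B = 1.
B = 1
eliminated = [(f1 - f2 * B) % P for f1, f2 in zip(pieces[0], pieces[1])]  # Eq. (2)
coeff = (G[2][0] - G[2][1] * B) % P  # remaining coefficient of S_3
leaked = [x * pow(coeff, -1, P) % P for x in eliminated]

print(any_three_ok, det_mod(A), bytes(leaked))  # True 0 b'et!!'
```

Two pieces, one fewer than the threshold m = 3, are thus enough to reveal the segment S_3 in full.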

IV. Constructing IDAs with Strong 
Confidentiality 

For an Information Dispersal Algorithm (IDA), to achieve 
strong confidentiality, we explore a sufficient and feasible 
condition as follows: 

Theorem 4.1: An IDA has strong confidentiality if the 
adopted erasure code meets the following condition: Any 
square submatrix of its generator matrix $G_{m \times n}$ is nonsingular. 

Proof: We prove this theorem by contradiction. Suppose that, 
in this case, the IDA has weak confidentiality. 
According to Theorem 3.1, in $G_{m \times n}$ there is a submatrix 
$A_{m' \times n'}$ of column rank r, where $m', n' < m$ and 
$n' - r = m - m' > 0$. Then, according to the proof of 
Theorem 3.1, $A_{m' \times n'}$ is rank deficient. Thus, in $A_{m' \times n'}$, any 
$\min(m', n') \times \min(m', n')$ square submatrix is singular, a 
contradiction. Therefore, this IDA has strong confidentiality. 

■ 

In Rabin's IDA [1], the corresponding generator matrix is a 
Cauchy matrix, in which any square submatrix is nonsingular. 
Thus, Rabin's IDA has strong confidentiality. 
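
The condition of Theorem 4.1 can be checked by brute force for small parameters. The sketch below is only an illustration under stated assumptions: it uses the prime field GF(257) and one common definition of a Cauchy matrix, with entry (i, j) equal to $1/(x_i - y_j)$ for distinct $x_i$ and $y_j$, which is not necessarily the exact parametrization used in [1]. The function names are hypothetical.

```python
from itertools import combinations

P = 257  # assumed prime modulus, chosen only for illustration


def det_mod(a, p=P):
    """Determinant of a square matrix over GF(p) by Gaussian elimination."""
    a = [[x % p for x in row] for row in a]
    n, det = len(a), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)
        for r in range(col + 1, n):
            f = a[r][col] * inv % p
            if f:
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[col])]
    return det % p


def cauchy_matrix(xs, ys, p=P):
    """Entry (i, j) is 1 / (x_i - y_j) over GF(p); all x_i and y_j are distinct."""
    return [[pow((x - y) % p, -1, p) for y in ys] for x in xs]


def all_square_submatrices_nonsingular(g, p=P):
    """Brute-force test of the sufficient condition in Theorem 4.1."""
    m, n = len(g), len(g[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if det_mod([[g[i][j] for j in cols] for i in rows], p) == 0:
                    return False
    return True


m, n = 3, 5
G_cauchy = cauchy_matrix(range(m), range(m, m + n))  # x = 0..2, y = 3..7
strong = all_square_submatrices_nonsingular(G_cauchy)
```

The exhaustive check grows combinatorially with m and n, so it is meant for small examples only. For Cauchy matrices the property holds in general because every square submatrix is itself a Cauchy matrix.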

Inspired by the work in [11], we now present an effective 
way to construct an IDA with strong confidentiality from an 
arbitrary m-of-(m + n) erasure code as follows: 

1) Choose an arbitrary m-of-(m + n) erasure code, whose 
generator matrix is $G_{m \times (m+n)} = (C_{m \times m} \mid D_{m \times n})$; 

2) Construct an IDA that adopts an m-of-n erasure code 
whose generator matrix is $C_{m \times m}^{-1} \cdot D_{m \times n}$. 

It is easy to verify that the IDA constructed above has strong 
confidentiality as follows: 

1) In the case where $C_{m \times m}$ is an $m \times m$ identity matrix, 
the chosen m-of-(m + n) erasure code is a systematic 
erasure code. Then, according to the nature of a 
systematic m-of-(m + n) erasure code [2], any square 
submatrix of $D_{m \times n} = C_{m \times m}^{-1} \cdot D_{m \times n}$ is nonsingular. 
Thus, according to Theorem 4.1, the constructed IDA 
has strong confidentiality. 

2) In the case where $C_{m \times m}$ is not an $m \times m$ 
identity matrix, the chosen m-of-(m + n) erasure 
code is a non-systematic erasure code. Then, 
$(I_{m \times m} \mid C_{m \times m}^{-1} \cdot D_{m \times n})$ is the generator matrix of 
the equivalent systematic m-of-(m + n) erasure 
code. So, any square submatrix of $C_{m \times m}^{-1} \cdot D_{m \times n}$ 
is nonsingular. Thus, according to Theorem 4.1, the 
constructed IDA also has strong confidentiality. 
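
The property of systematic codes invoked in both cases can be checked directly. The derivation below is a sketch of the standard argument, added for convenience; the index sets R and T, the matrix M and the notation D[R, T] are introduced here and do not appear in the original text.

```latex
% Claim: if (I_{m x m} | D_{m x n}) generates an m-of-(m+n) erasure code,
% then every square submatrix of D is nonsingular.
Let $R \subseteq \{1,\ldots,m\}$ and $T \subseteq \{1,\ldots,n\}$ with $|R| = |T| = k$,
and let $D[R,T]$ denote the $k \times k$ submatrix of $D$ in rows $R$ and columns $T$.
Form the $m \times m$ matrix $M$ from the $m-k$ identity columns $e_i$, $i \notin R$,
together with the $k$ columns of $D$ indexed by $T$.
Expanding $\det M$ along the identity columns deletes the rows $i \notin R$, so
\[
  \det M = \pm \det D[R,T].
\]
Since any $m$ columns of the generator matrix are linearly independent,
$\det M \neq 0$ and hence $\det D[R,T] \neq 0$.

% Non-systematic case: reduction to the systematic one.
For $G = (C \mid D)$ with $C$ nonsingular,
\[
  C^{-1} \cdot (C \mid D) = (I \mid C^{-1} D),
\]
and for any set $J$ of $m$ columns,
$\det\bigl((C^{-1} G)_J\bigr) = \det(C)^{-1} \det(G_J) \neq 0$,
so $(I \mid C^{-1} D)$ again generates an $m$-of-$(m+n)$ erasure code.
```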

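Example: We construct an IDA with strong confidentiality 
from a Reed-Solomon code [10], whose generator matrix is a 
Vandermonde matrix. From what we have discussed above, 
we first choose an m-of-(m + n) Reed-Solomon code with 
generator matrix 

$$G_{RS} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_{m+n} \\ \vdots & \vdots & \ddots & \vdots \\ a_1^{m-1} & a_2^{m-1} & \cdots & a_{m+n}^{m-1} \end{pmatrix}, \tag{3}$$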



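where $a_1, a_2, \ldots, a_{m+n}$ are distinct. Then, an IDA with strong 
confidentiality can be constructed, in which the corresponding 
generator matrix is 

$$G_{IDA} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_m \\ \vdots & \vdots & \ddots & \vdots \\ a_1^{m-1} & a_2^{m-1} & \cdots & a_m^{m-1} \end{pmatrix}^{-1} \cdot \begin{pmatrix} 1 & 1 & \cdots & 1 \\ a_{m+1} & a_{m+2} & \cdots & a_{m+n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m+1}^{m-1} & a_{m+2}^{m-1} & \cdots & a_{m+n}^{m-1} \end{pmatrix}. \tag{4}$$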



From the comparison results in [12], we can deduce that 
the computation complexity of this IDA is comparable to or 
sometimes even lower than that of Rabin's IDA. 
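
The construction of Equations (3) and (4) can be sketched for small parameters as follows. This is a minimal illustration, not the implementation evaluated in the cited comparison: the field GF(257), the elements $a_j = j$, the sizes m = 3 and n = 4, and all function names are assumptions made only for this example.

```python
from itertools import combinations

P = 257  # assumed prime modulus, chosen only for illustration


def inverse_mod(a, p=P):
    """Return the inverse of a square matrix over GF(p), or None if singular."""
    n = len(a)
    aug = [[x % p for x in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [x * inv % p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(x - f * y) % p for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_mul(a, b, p=P):
    """Matrix product over GF(p)."""
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*b)]
            for row in a]


def vandermonde(elements, m, p=P):
    """m x len(elements) matrix with entry (i, j) = a_j ** i, as in Equation (3)."""
    return [[pow(a, i, p) for a in elements] for i in range(m)]


def all_square_submatrices_nonsingular(g, p=P):
    """Brute-force test of the sufficient condition in Theorem 4.1."""
    m, n = len(g), len(g[0])
    return all(
        inverse_mod([[g[i][j] for j in cols] for i in rows], p) is not None
        for k in range(1, min(m, n) + 1)
        for rows in combinations(range(m), k)
        for cols in combinations(range(n), k))


m, n = 3, 4
elements = list(range(1, m + n + 1))   # distinct a_1, ..., a_{m+n}
G_rs = vandermonde(elements, m)        # Equation (3), size m x (m + n)
C = [row[:m] for row in G_rs]          # first m columns
D = [row[m:] for row in G_rs]          # last n columns
G_ida = mat_mul(inverse_mod(C), D)     # Equation (4), size m x n
strong = all_square_submatrices_nonsingular(G_ida)
```

For these parameters `strong` evaluates to `True`, in line with the verification given above. The inversion of C is a one-time setup cost, after which encoding uses only the m x n matrix `G_ida`.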

Remark: Besides Cauchy matrices, Vandermonde matrices 
were also suggested for the generator matrices of IDAs in 
Rabin's seminal paper [1, Page 339]. However, a Vandermonde 
matrix defined over a finite field may contain singular square 
submatrices [2, Page 323, Problem (7)]. Then, an IDA whose 
erasure code is a Reed-Solomon code defined over a finite field 
may not meet the condition in Theorem 4.1 and thus may have 
weak confidentiality. Luckily, in the literature, when Rabin's 
IDA is mentioned, it always refers to the one constructed from 
a Cauchy matrix. 
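
A small case shows how a Vandermonde generator matrix over a finite field can contain a singular square submatrix. The example below is our own illustration, not the one from the cited problem: it assumes the field GF(7) and the elements 1, 2, 3, 6, where 6 = -1 (mod 7), so the columns for 1 and 6 agree in every even-power row.

```python
P = 7  # assumed small prime field, chosen so the effect is easy to see

m = 3
elements = [1, 2, 3, 6]  # distinct and nonzero; note that 6 = -1 (mod 7)

# 3-of-4 Vandermonde generator matrix with entry (i, j) = a_j ** i.
G = [[pow(a, i, P) for a in elements] for i in range(m)]

# Rows 1 and 3, columns 1 and 4 (0-based: rows 0, 2 and columns 0, 3).
sub = [[G[i][j] for j in (0, 3)] for i in (0, 2)]
det = (sub[0][0] * sub[1][1] - sub[0][1] * sub[1][0]) % P

# Symbols are integers in [0, 7); three segments of three symbols each.
segments = [[3, 1, 4], [1, 5, 6], [2, 6, 5]]
pieces = [[sum(segments[i][t] * G[i][j] for i in range(m)) % P
           for t in range(3)] for j in range(4)]

# An eavesdropper holding only F_1 and F_4 cancels S_1 and S_3 by subtraction.
diff = [(x - y) % P for x, y in zip(pieces[0], pieces[3])]
coeff = (G[1][0] - G[1][3]) % P  # remaining coefficient of S_2
leaked = [d * pow(coeff, -1, P) % P for d in diff]

print(G, det, leaked)
```

Here `det` is 0 and `leaked` equals the second segment, so two pieces already reveal S_2 even though any three columns of `G` remain linearly independent.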

V. Conclusions 

Information Dispersal Algorithms (IDAs) [1] have been 
widely applied to reliable and secure storage and 
transmission of data files in distributed systems. This paper made 
a systematic study on the confidentiality of IDAs and its 
connection with the adopted erasure codes [2]. Specifically, 
this paper studied the confidentiality of IDAs from the view 
of practical security. This paper defined and discussed two 
levels of confidentiality: weak confidentiality (in the case 
where some parts of the original file can be reconstructed 
explicitly from fewer than the threshold number of pieces) 
and strong confidentiality (in the case where nothing of the 
original file can be reconstructed explicitly from fewer than the 
threshold number of pieces). This paper showed which kinds 
of IDAs have weak confidentiality and how an eavesdropper 
can reconstruct some segments of the original file explicitly 
from fewer than the threshold number of pieces in the case 
of weak confidentiality (see Theorem 3.1). It was noticed that 
for an IDA that adopts an arbitrary non-systematic erasure 
code, its confidentiality may fall into weak confidentiality. To 
achieve strong confidentiality, this paper explored a sufficient 
and feasible condition for an IDA (see Theorem 4.1). It was 
shown that Rabin's IDA [1] has strong confidentiality. At the 
same time, this paper presented an effective way to construct 
an IDA with strong confidentiality. Then, as an example, this 
paper constructed an IDA with strong confidentiality from 
a Reed-Solomon code [10], the computation complexity of 
which is comparable to or sometimes even lower than that of 
Rabin's IDA. 

The key message we want to deliver through this paper is 
that an arbitrary non-systematic erasure code is not enough 
for strong confidentiality in an IDA. When referring to the 
confidentiality of an IDA in a practical application, we should 
keep this point in mind! 